Overview
Phase 4 is the final enrichment stage where the base JSON from Phase 3 is modified in-place by 5 specialized scripts. Each script adds specific fields to complete the 86-field schema.
Order matters! Scripts must run sequentially:
advanced_metrics_processor.py (needs OHLCV)
process_earnings_performance.py (needs filings + OHLCV)
enrich_fno_data.py (needs external FNO data)
process_market_breadth.py (needs returns + SMA status)
add_corporate_events.py (MUST BE LAST — adds event markers + news)
Execution Order
Phase 4 runs 5 scripts sequentially:
Advanced Metrics Injection
advanced_metrics_processor.py — Injects ADR, RVOL, ATH, Turnover metrics
Earnings Performance Injection
process_earnings_performance.py — Injects post-earnings returns
F&O Data Injection
enrich_fno_data.py — Injects F&O flag, lot size, next expiry
Market Breadth Processing
process_market_breadth.py — Generates sector analytics and breadth metrics
Corporate Events Injection (FINAL)
add_corporate_events.py — Injects event markers, announcements, news feed
Script 1: advanced_metrics_processor.py
Purpose
Injects OHLCV-derived metrics: ADR, RVOL, ATH, Turnover, Volume EMA.
Input Files
all_stocks_fundamental_analysis.json (Phase 3 output)
ohlcv_data/{SYMBOL}.csv (Phase 2.5 output)
complete_price_bands.json (Phase 2 output)
Processing Logic
Calculate ADR (Average Daily Range)
Calculate RVOL (Relative Volume)
Calculate ATH (All-Time High)
Calculate Turnover Metrics
Calculate Volume EMA
import pandas as pd

df = pd.read_csv(f"ohlcv_data/{symbol}.csv")
# Daily range as a percentage of the day's low
df['Daily_Range_Pct'] = ((df['High'] - df['Low']) / df['Low']) * 100
# ADR: mean daily range over the most recent N sessions
adr_5 = df['Daily_Range_Pct'].tail(5).mean()
adr_14 = df['Daily_Range_Pct'].tail(14).mean()
adr_20 = df['Daily_Range_Pct'].tail(20).mean()
adr_30 = df['Daily_Range_Pct'].tail(30).mean()
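The RVOL and ATH steps follow the same pattern as ADR. The sketch below is a minimal illustration, not the code in advanced_metrics_processor.py: it assumes RVOL compares the latest session's volume with the mean of the previous 20 sessions, and that the ATH is the maximum High in the CSV. The function names are hypothetical, and plain lists are used so the example runs without pandas (with the DataFrame above you could pass `df['Volume'].tolist()` and `df['High'].tolist()`).

```python
from statistics import mean


def relative_volume(volumes, window=20):
    """Latest volume divided by the mean of the preceding `window` sessions."""
    if len(volumes) < window + 1:
        return 0.0  # not enough history; 0 mirrors the pipeline's placeholder
    baseline = mean(volumes[-(window + 1):-1])
    return round(volumes[-1] / baseline, 2) if baseline else 0.0


def pct_from_ath(highs, last_close):
    """Return (ATH, percentage distance of last_close from the ATH)."""
    ath = max(highs)
    return ath, round((last_close - ath) / ath * 100, 2)
```

A negative distance means the stock trades below its all-time high, which matches the sign of the `% from ATH` example value.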
Fields Injected
| Field | Description | Example |
|---|---|---|
| RVOL | Relative Volume (today vs 20D avg) | 1.45 |
| 5 Days MA ADR(%) | 5-day average daily range | 3.2 |
| 14 Days MA ADR(%) | 14-day average daily range | 3.5 |
| 20 Days MA ADR(%) | 20-day average daily range | 3.4 |
| 30 Days MA ADR(%) | 30-day average daily range | 3.6 |
| % from ATH | Distance from all-time high | -12.5 |
| ATH_Value | All-time high price | 2850.00 |
| Gap Up % | Today’s gap vs yesterday close | 1.2 |
| Day Range(%) | Today’s high-low spread | 2.8 |
| 6 Month Returns(%) | 6-month price return | 18.5 |
| % from 52W Low | Distance from 52-week low | 72.8 |
| 30 Days Average Rupee Volume(Cr.) | 30-day avg turnover | 1250.5 |
| Daily Rupee Turnover 20(Cr.) | 20-day avg turnover | 1180.2 |
| Daily Rupee Turnover 50(Cr.) | 50-day avg turnover | 1120.8 |
| Daily Rupee Turnover 100(Cr.) | 100-day avg turnover | 1050.3 |
| 200 Days EMA Volume | 200-day EMA of volume | 12500000 |
| % from 52W High 200 Days EMA Volume | Volume EMA trend | -8.5 |
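The turnover and volume EMA fields can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual code: turnover is taken to be Close × Volume expressed in crores (1 crore = 10^7 rupees), and the EMA uses the standard smoothing factor 2 / (span + 1), seeded with the first value. Both function names are hypothetical.

```python
def avg_rupee_turnover_cr(closes, volumes, days):
    """Mean of Close * Volume over the last `days` sessions, in crores."""
    pairs = list(zip(closes, volumes))[-days:]
    if not pairs:
        return 0.0
    return sum(c * v for c, v in pairs) / len(pairs) / 1e7


def volume_ema(volumes, span=200):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    if not volumes:
        return 0.0
    alpha = 2 / (span + 1)
    ema = volumes[0]  # seed with the oldest observation
    for v in volumes[1:]:
        ema = alpha * v + (1 - alpha) * ema
    return ema
```

Calling `avg_rupee_turnover_cr(closes, volumes, 20)` would correspond to the 20-day turnover field, and likewise for the 50- and 100-day windows.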
Threading
Workers: 10 concurrent threads
Typical Time: ~1-2 minutes (reading 2,775 CSV files)
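A 10-worker pool like the one described here could look like the sketch below. It assumes a per-symbol worker function (called `process_symbol` here, a hypothetical name) and uses `concurrent.futures.ThreadPoolExecutor` from the standard library; the real script may organize its threads differently.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10  # matches the worker count stated above


def process_all(symbols, process_symbol, max_workers=MAX_WORKERS):
    """Run process_symbol(symbol) across a thread pool.

    Returns {symbol: result}; a symbol whose worker raises maps to None.
    """
    symbols = list(symbols)  # iterated twice below

    def safe(symbol):
        try:
            return process_symbol(symbol)
        except Exception:
            return None  # one bad CSV should not stop the other symbols

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly
        return dict(zip(symbols, pool.map(safe, symbols)))
```

Threads suit this step because the work is dominated by file reads, which release the GIL while waiting on I/O.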
Dependency on OHLCV
If FETCH_OHLCV = False in Phase 2.5, all these fields will remain 0.
Script 2: process_earnings_performance.py
Purpose
Injects post-earnings price performance metrics.
Input Files
all_stocks_fundamental_analysis.json (Phase 3 output, modified by Script 1)
company_filings/{SYMBOL}_filings.json (Phase 2 output)
ohlcv_data/{SYMBOL}.csv (Phase 2.5 output)
Processing Logic
Find Earnings Date from Filings
Calculate Returns Since Earnings
Calculate Max Returns Since Earnings
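from datetime import datetime

earnings_date = None
# Take the first filing whose caption mentions quarterly results
for filing in company_filings:
    caption = filing.get("caption", "").lower()
    if "quarterly" in caption and "results" in caption:
        earnings_date = datetime.strptime(filing["news_date"], "%Y-%m-%d")
        break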
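Once the earnings date is known, the two return fields can be derived from the OHLCV closes. The sketch below is one plausible reading of the field descriptions, not the script's confirmed logic: it assumes the base price is the last close strictly before the earnings date, and that the maximum return is measured from that base to the highest close on or after the earnings date. The function name and the `(date, close)` row shape are hypothetical.

```python
from datetime import date


def returns_since_earnings(rows, earnings_date):
    """rows: list of (date, close) sorted by date ascending.

    Returns (returns_pct, max_returns_pct), or (None, None) when there is
    no close before the earnings date or no session on/after it.
    """
    before = [close for day, close in rows if day < earnings_date]
    after = [close for day, close in rows if day >= earnings_date]
    if not before or not after:
        return None, None
    base = before[-1]  # pre-earnings close
    current = (after[-1] - base) / base * 100
    peak = (max(after) - base) / base * 100
    return round(current, 2), round(peak, 2)
```

For example, with a pre-earnings close of 100, a latest close of 108.5 and a post-earnings peak close of 112.3, the function returns 8.5 and 12.3.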
Fields Injected
| Field | Description | Example |
|---|---|---|
| Quarterly Results Date | Date of latest earnings filing | 2026-02-15 |
| Returns since Earnings(%) | % change from pre-earnings close to current | 8.5 |
| Max Returns since Earnings(%) | Peak % gain since earnings | 12.3 |
Typical Time
~2-3 minutes — Reading 2,775 filing JSONs + CSV lookups
Script 3: enrich_fno_data.py
Purpose
Injects F&O (Futures & Options) metadata: lot size, next expiry, F&O flag.
Input Files
all_stocks_fundamental_analysis.json (Phase 3 output, modified by Scripts 1-2)
fno_lot_sizes_cleaned.json (External standalone script)
fno_expiry_calendar.json (External standalone script)
Processing Logic
Inject F&O Flag
Inject Lot Size
Inject Next Expiry
for stock in master_data:
    symbol = stock["Symbol"]
    # Check if symbol is in F&O list
    stock["Is FNO"] = 1 if symbol in lot_map else 0
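The lot size and next expiry steps are lookups of the same kind. The sketch below assumes `lot_map` is a `{symbol: lot_size}` dict and that the expiry calendar is a list of ISO date strings; the actual layouts of fno_lot_sizes_cleaned.json and fno_expiry_calendar.json are not shown in this document, so treat both shapes, the function names, and the "N/A" value for non-F&O stocks as assumptions.

```python
from datetime import date


def next_expiry(expiry_dates, today):
    """Earliest ISO-formatted expiry on or after `today`, else None."""
    upcoming = sorted(d for d in expiry_dates if date.fromisoformat(d) >= today)
    return upcoming[0] if upcoming else None


def enrich_fno(stock, lot_map, expiry_dates, today):
    """Set the three F&O fields on one stock record in place."""
    symbol = stock["Symbol"]
    is_fno = symbol in lot_map
    stock["Is FNO"] = 1 if is_fno else 0
    stock["FNO Lot Size"] = lot_map.get(symbol, "N/A")
    stock["Next Expiry"] = next_expiry(expiry_dates, today) if is_fno else "N/A"
    return stock
```

ISO date strings sort chronologically, which is why a plain `sorted()` is enough to find the nearest expiry.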
Fields Injected
| Field | Description | Example |
|---|---|---|
| Is FNO | 1 if F&O enabled, 0 otherwise | 1 |
| FNO Lot Size | Contract lot size | 250 |
| Next Expiry | Next futures expiry date | 2026-03-27 |
Typical Time
~10-20 seconds — Simple JSON lookups
Script 4: process_market_breadth.py
Purpose
Generates sector-level analytics and relative strength ratings.
Input Files
all_stocks_fundamental_analysis.json (Phase 3 output, modified by Scripts 1-3)
Processing Logic
Calculate Sector Breadth
Calculate Relative Strength Rating (RSR)
sector_stats = {}
for stock in master_data:
    sector = stock.get("Sector")
    if sector not in sector_stats:
        sector_stats[sector] = {
            "above_sma_50": 0,
            "above_sma_200": 0,
            "total_stocks": 0,
        }
    sector_stats[sector]["total_stocks"] += 1
    # Substring checks against the SMA Status text
    sma_status = stock.get("SMA Status", "")
    if "Above" in sma_status and "SMA 50" in sma_status:
        sector_stats[sector]["above_sma_50"] += 1
    if "Above" in sma_status and "SMA 200" in sma_status:
        sector_stats[sector]["above_sma_200"] += 1
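The counts collected above become breadth metrics once they are expressed as percentages. The sketch below shows that conversion, plus one common way to build a relative strength rating: a 1-99 percentile rank of returns. The document does not specify the RSR formula, so the ranking function is an assumption, and both function names and the output key names are hypothetical.

```python
def breadth_percentages(sector_stats):
    """Convert per-sector counts into percentages of stocks above each SMA."""
    result = {}
    for sector, s in sector_stats.items():
        total = s["total_stocks"]
        result[sector] = {
            "pct_above_sma_50": round(100 * s["above_sma_50"] / total, 1) if total else 0.0,
            "pct_above_sma_200": round(100 * s["above_sma_200"] / total, 1) if total else 0.0,
        }
    return result


def percentile_rank(values):
    """Rank each value 1-99 by the share of values strictly below it."""
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for other in values if other < v)
        ranks.append(max(1, min(99, round(100 * below / n))))
    return ranks
```

The ranking loop is quadratic in the number of stocks, which is acceptable for a few thousand records but could be replaced with a sort-based rank if needed.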
Output Files
| File | Description |
|---|---|
| sector_analytics.json | Sector-level breadth metrics |
| market_breadth.csv | Daily market breadth snapshot |
Typical Time
~20-30 seconds — In-memory calculations
Script 5: add_corporate_events.py (CRITICAL FINAL STEP)
Purpose
MUST BE LAST! Injects event markers, regulatory announcements, and news feed.
Input Files
all_stocks_fundamental_analysis.json (Phase 3 output, modified by Scripts 1-4)
upcoming_corporate_actions.json (Phase 2 output)
company_filings/{SYMBOL}_filings.json (Phase 2 output)
market_news/{SYMBOL}_news.json (Phase 2 output)
nse_asm_list.json (Phase 2 output)
nse_gsm_list.json (Phase 2 output)
bulk_block_deals.json (Phase 2 output)
incremental_price_bands.json (Phase 2 output)
Event Marker Logic
1. Surveillance Markers (★)
2. Corporate Action Markers (⏰, 💸, 🎁, ✂️, 📈)
3. Filing-Based Markers (📊, 🔑)
4. Block Deal Marker (📦)
5. Circuit Limit Revision (#)
import json

with open(asm_file, "r") as f:
    asm_data = json.load(f)
# Tag each ASM-listed symbol with its surveillance stage
for item in asm_data:
    symbol = item.get("Symbol")
    stage = item.get("Stage", "")
    if "LTASM" in stage:
        add_event(symbol, "★: LTASM")
    elif "STASM" in stage:
        add_event(symbol, "★: STASM")
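The snippet above calls `add_event` without showing it. A minimal sketch of such a helper is given below, assuming markers are collected per symbol in memory, de-duplicated, and copied onto each record at the end; the `events` container and the `attach_events` function are hypothetical names, and the real implementation may differ.

```python
from collections import defaultdict

# symbol -> ordered list of marker strings
events = defaultdict(list)


def add_event(symbol, marker):
    """Record a marker for a symbol once, preserving insertion order."""
    if symbol and marker not in events[symbol]:
        events[symbol].append(marker)


def attach_events(master_data):
    """Copy collected markers onto each stock's "Event Markers" field."""
    for stock in master_data:
        # .get avoids creating empty entries for symbols with no events
        stock["Event Markers"] = list(events.get(stock["Symbol"], []))
```

Collecting first and attaching once keeps the five marker sources independent of each other and of the order in which they are read.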
Announcements Injection
# Top 5 regulatory filings
filings = load_filings(symbol)[:5]
announcements = []
for filing in filings:
    announcements.append({
        "Date": filing.get("news_date"),
        "Headline": filing.get("caption"),
        "URL": filing.get("pdf_url"),
    })
stock["Recent Announcements"] = announcements
News Feed Injection
# Top 5 news items with sentiment
news_items = load_news(symbol)[: 5 ]
news_feed = []
for news in news_items:
news_feed.append({
"Title" : news.get( "title" ),
"Sentiment" : news.get( "sentiment" ), # positive/negative/neutral
"Date" : news.get( "timestamp" )
})
stock[ "News Feed" ] = news_feed
Fields Injected
| Field | Description | Example |
|---|---|---|
| Event Markers | Array of event strings | ["★: LTASM", "💸: Dividend (15-Mar)", "📦: Block Deal"] |
| Recent Announcements | Top 5 regulatory filings | [{"Date": "2026-02-15", "Headline": "Quarterly Results", "URL": "..."}] |
| News Feed | Top 5 news items | [{"Title": "Stock hits 52W high", "Sentiment": "positive", "Date": "2026-03-01"}] |
Event Marker Icons Reference
| Icon | Name | Trigger Condition |
|---|---|---|
| ★ | Surveillance | Stock in ASM/GSM lists |
| 📊 | Results Recently Out | Results filed in last 7 days |
| 🔑 | Insider Trading | SEBI Reg 7(2) / Form C in last 15 days |
| 📦 | Block Deal | Bulk/Block deal in last 7 days |
| # | Circuit Revision | Price band changed |
| ⏰ | Results Upcoming | Results due in next 14 days |
| 💸 | Dividend | Dividend ex-date in next 30 days |
| 🎁 | Bonus | Bonus ex-date in next 30 days |
| ✂️ | Split | Split ex-date in next 30 days |
| 📈 | Rights | Rights issue in next 30 days |
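The forward-looking triggers in this reference share one shape: an event date that falls within a look-ahead window. The sketch below encodes those windows and produces marker strings in the `"💸: Dividend (15-Mar)"` style seen in the Event Markers example. The action-type keys, the "Results" label text, and the function name are assumptions; the layout of upcoming_corporate_actions.json is not shown in this document.

```python
from datetime import date

# action type -> (icon, label, look-ahead window in days), per the reference
UPCOMING_WINDOWS = {
    "results": ("⏰", "Results", 14),
    "dividend": ("💸", "Dividend", 30),
    "bonus": ("🎁", "Bonus", 30),
    "split": ("✂️", "Split", 30),
    "rights": ("📈", "Rights", 30),
}


def upcoming_marker(action_type, event_date, today):
    """Return a marker such as '💸: Dividend (15-Mar)', or None if out of window."""
    entry = UPCOMING_WINDOWS.get(action_type)
    if entry is None:
        return None
    icon, label, window = entry
    days_ahead = (event_date - today).days
    if 0 <= days_ahead <= window:
        return f"{icon}: {label} ({event_date.strftime('%d-%b')})"
    return None
```

The backward-looking markers (📊, 🔑, 📦) would use the same comparison with the sign flipped, checking that the event happened within the last 7 or 15 days.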
Typical Time
~3-5 minutes — Reading 2,775 filing JSONs + 2,775 news JSONs + event logic
Phase 4 Output Summary
Final Master JSON
📦 Phase 4 Final Output:
└─ all_stocks_fundamental_analysis.json (~55 MB, 2,775 records, 86 COMPLETE fields)
Field Completion Status
Before Phase 4 (Phase 3 Output)
✅ 60 fields populated (fundamentals, technicals, ratios)
❌ 26 fields placeholder (0 or empty arrays)
After Phase 4 (Final)
✅ 86 fields populated (the 26 placeholders are filled by the 5 enrichment scripts)
Total Phase 4 Execution Time
~6-10 minutes (sum of all 5 scripts)
Breakdown:
Script 1 (ADR/RVOL/ATH): ~2 min
Script 2 (Earnings): ~3 min
Script 3 (F&O): ~20 sec
Script 4 (Breadth): ~30 sec
Script 5 (Events): ~4 min
Critical Sequencing
Why order matters:
advanced_metrics_processor.py first — Needs raw OHLCV files
process_earnings_performance.py second — Needs filings + OHLCV
enrich_fno_data.py third — Independent lookup
process_market_breadth.py fourth — Needs returns + SMA status from previous scripts
add_corporate_events.py LAST — Adds final UI elements (markers, news)
Running out of order will cause missing data or overwrites.
Error Handling
Phase 4 uses soft failure mode:
results["advanced_metrics_processor.py"] = run_script("advanced_metrics_processor.py", "Phase 4")
# Pipeline continues even if enrichment fails
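The body of `run_script` is not shown here. A minimal sketch of a soft-failure runner is given below, assuming each script is launched as a subprocess and that a non-zero exit code counts as failure; the `runner` parameter, `PHASE_4_SCRIPTS`, and `run_phase_4` are hypothetical additions for illustration and testing.

```python
import subprocess
import sys

PHASE_4_SCRIPTS = [
    "advanced_metrics_processor.py",
    "process_earnings_performance.py",
    "enrich_fno_data.py",
    "process_market_breadth.py",
    "add_corporate_events.py",  # must run last
]


def run_script(script, phase, runner=subprocess.run):
    """Run one script; return True on success, False on any failure."""
    try:
        completed = runner([sys.executable, script], check=False)
        ok = completed.returncode == 0
    except OSError:
        ok = False  # e.g. the interpreter could not be launched
    if not ok:
        print(f"[{phase}] {script} failed; continuing")
    return ok


def run_phase_4(runner=subprocess.run):
    """Run every Phase 4 script in order, never aborting on failure."""
    return {s: run_script(s, "Phase 4", runner) for s in PHASE_4_SCRIPTS}
```

Because failures are recorded rather than raised, the returned dict is the place to check which fields may still hold placeholder values.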
Impact of Failures
Script 1 fails: ADR, RVOL, ATH remain 0
Script 2 fails: Earnings performance fields remain null
Script 3 fails: F&O fields remain N/A
Script 4 fails: No sector analytics, no RSR
Script 5 fails: Event markers, announcements, news feed remain empty
Next Phase
Once Phase 4 completes, the pipeline proceeds to Phase 5 (compression). See Pipeline Architecture for the complete pipeline overview, including Phase 5 compression details.
Validation Checklist
After Phase 4, verify:
# 1. Check file size (should be ~55 MB)
ls -lh all_stocks_fundamental_analysis.json
# 2. Validate field count
jq '.[0] | keys | length' all_stocks_fundamental_analysis.json # Expected: 86
# 3. Check sample stock has all fields populated
jq '.[0]' all_stocks_fundamental_analysis.json | grep -E '(RVOL|Event Markers|Recent Announcements)'
# 4. Count stocks with event markers
jq '[.[] | select(."Event Markers" | length > 0)] | length' all_stocks_fundamental_analysis.json
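The same checks can be scripted so that they fail loudly in an automated run. The sketch below is a hypothetical helper, not part of the pipeline: it assumes the master file is a JSON array of flat records, and it treats "no record has any event markers" as a warning sign, which is an assumption rather than a documented rule.

```python
EXPECTED_FIELDS = 86  # schema size stated for the final master JSON


def validate_master(records, expected_fields=EXPECTED_FIELDS):
    """Return a list of human-readable problems; empty means all checks passed."""
    if not records:
        return ["no records found"]
    problems = []
    for i, rec in enumerate(records):
        if len(rec) != expected_fields:
            problems.append(
                f"record {i}: {len(rec)} fields, expected {expected_fields}"
            )
    with_markers = sum(1 for r in records if r.get("Event Markers"))
    if with_markers == 0:
        problems.append("no record has event markers")
    return problems
```

Typical usage would load the file with `json.load` and pass the resulting list to `validate_master`, exiting non-zero if the returned list is not empty.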